1 Introduction

Concentration of measure is a well-studied phenomenon, and in the past 30 years or so it has been
explored through a wide array of tools and techniques; see [?] for broad introductions. Results
in this area are equally well motivated by theoretical questions (in areas such as geometry, functional 
analysis and probability), as by numerous applications in different fields including the analysis of 
algorithms, mathematical physics and empirical processes in statistics. 

From the probabilistic point of view, measure concentration describes situations where a random
variable is strongly concentrated around a particular value. This is typically quantified by the rate
of decay of the probability that the random variable deviates from that value (usually its mean or
median) by a certain amount. As a simple concrete example consider a function $f(W)$ of a
Poisson($\lambda$) random variable $W$; if $f : \mathbb{Z}_+ \to \mathbb{R}$ is 1-Lipschitz, i.e.,
$|f(k) - f(k+1)| \le 1$ for all $k \in \mathbb{Z}_+ = \{0,1,2,\ldots\}$, then [2],

$$\Pr\{f(W) - E[f(W)] > t\} \le \exp\Big\{-\frac{t}{4}\log\Big(1 + \frac{t}{2\lambda}\Big)\Big\}. \tag{1}$$

Although the distribution of $f(W)$ may be quite complex, (1) provides a simple, explicit bound on
the probability that it deviates from its mean by an amount $t$. This is a general theme: Under
appropriate conditions, it is possible to derive useful, accurate bounds of this type for a large class of
random variables with complex and often only partially known distributions. We also note that the
consideration of Lipschitz functions is motivated by applications, but it is also related to fundamental
concentration properties captured by isoperimetric inequalities.
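To make the Poisson example concrete, the following minimal sketch compares the exact upper tail of $W - E[W]$ (that is, $f(k) = k$, which is 1-Lipschitz) with a bound of the form $\exp\{-(t/4)\log(1 + t/(2\lambda))\}$. The constants 4 and 2 in that expression, the value of $\lambda$, and all function names are assumptions made for this illustration only.

```python
import math

def poisson_pmf(lam, kmax):
    """Poisson(lam) probabilities for k = 0..kmax via p_k = p_{k-1} * lam / k."""
    p = [math.exp(-lam)]
    for k in range(1, kmax + 1):
        p.append(p[-1] * lam / k)
    return p

def exact_upper_tail(lam, t, kmax=400):
    """Pr{W - lam > t} for W ~ Poisson(lam), i.e. f is the identity."""
    pmf = poisson_pmf(lam, kmax)
    return sum(pk for k, pk in enumerate(pmf) if k - lam > t)

def poisson_bound(lam, t):
    """exp{-(t/4) log(1 + t/(2 lam))}; the constants are assumed, see lead-in."""
    return math.exp(-(t / 4.0) * math.log(1.0 + t / (2.0 * lam)))

lam = 3.0  # hypothetical parameter value
rows = [(t, exact_upper_tail(lam, t), poisson_bound(lam, t))
        for t in (1.0, 2.0, 5.0, 10.0, 20.0)]
for t, exact, bound in rows:
    print(f"t={t:5.1f}  exact tail={exact:.3e}  bound={bound:.3e}")
```

The bound is loose for small $t$ and only becomes informative as $t$ grows, which is the regime the concentration inequality is designed for.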

The bound (1) was established in [2] using the so-called "entropy method," pioneered by Ledoux
[?]. The entropy method consists of two steps. First, a (possibly modified) logarithmic-Sobolev
inequality is established for the distribution of interest. Recall that, for an arbitrary probability
measure $\mu$ and any nonnegative function $f$ on the same space, the entropy functional
$\mathrm{Ent}_\mu(f)$ is defined by

$$\mathrm{Ent}_\mu(f) = \int f \log f \, d\mu - \Big(\int f \, d\mu\Big) \log\Big(\int f \, d\mu\Big),$$

whenever all the above integrals exist. In the case of the Poisson, Bobkov and Ledoux [2] proved the
following modified log-Sobolev inequality: Writing $P_\lambda$ for the Poisson($\lambda$) measure, for any
function $f : \mathbb{Z}_+ \to \mathbb{R}$ with positive values,

$$\mathrm{Ent}_{P_\lambda}(f) \le \lambda E_{P_\lambda}\Big[\frac{1}{f}|Df|^2\Big],$$

where $Df(k) = f(k+1) - f(k)$, $k \ge 0$, is the discrete gradient, and $E_\mu$ denotes the expectation
operator with respect to a measure $\mu$. In fact, they also established the following sharper bound
which we will use below; for any function $f$ on $\mathbb{Z}_+$,

$$\mathrm{Ent}_{P_\lambda}(e^f) \le \lambda E_{P_\lambda}\Big[e^f\big\{|Df|e^{|Df|} - e^{|Df|} + 1\big\}\Big]. \tag{2}$$

The second step in the entropy method is the so-called Herbst argument: Starting from some
Lipschitz function $f$, the idea is to use the modified log-Sobolev inequality to obtain an upper bound
on the entropy of $e^{\tau f}$, and from that to deduce a differential inequality for the moment-generating
function $G(\tau) = E[e^{\tau f}]$ of $f$. Then, solving the differential inequality yields an upper bound on
$G(\tau)$, and this leads to a concentration bound via Markov's inequality.

Our main goal in this work is to carry out a similar program for an arbitrary compound Poisson
measure on $\mathbb{Z}_+$. Recall that for any $\lambda > 0$ and any probability measure $Q$ on the natural
numbers $\mathbb{N} = \{1, 2, \ldots\}$, the compound Poisson distribution CP($\lambda, Q$) is the distribution
of the random sum

$$\sum_{j=1}^{W} X_j,$$

where $W \sim$ Poisson($\lambda$) and the $X_j$ are independent random variables with distribution $Q$ on
$\mathbb{N}$, also independent of $W$; we denote the CP($\lambda, Q$) measure by $CP_{\lambda,Q}$. The class of
compound Poisson distributions is much richer than the one-dimensional Poisson family. In particular,
the CP($\lambda, Q$) law inherits its tail behavior from $Q$: CP($\lambda, Q$) has finite variance iff $Q$ does,
it has exponentially decaying tails iff $Q$ does, and so on [13]. It is in part from this versatility of
tail behavior that the compound Poisson distribution draws its importance in many applications.
Alternatively, CP($\lambda, Q$) is characterized as the infinitely divisible law without a Gaussian
component and with Lévy measure $\lambda Q$.
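As an illustration of the definition, the sketch below computes the probability mass function of a CP($\lambda, Q$) law with the classical Panjer recursion (a standard tool that is not used in this paper) and checks that the mean and variance equal $\lambda q_1$ and $\lambda q_2$, where $q_r = \sum_j j^r Q_j$. The jump distribution `Q`, the value of `lam`, and the function name are hypothetical choices for the example.

```python
import math

def compound_poisson_pmf(lam, Q, kmax):
    """pmf of CP(lam, Q) on {0, ..., kmax}; Q maps jump size j >= 1 to Q_j.

    Panjer recursion: p_0 = exp(-lam), k * p_k = lam * sum_j j * Q_j * p_{k-j}.
    """
    p = [math.exp(-lam)] + [0.0] * kmax
    for k in range(1, kmax + 1):
        p[k] = (lam / k) * sum(j * qj * p[k - j] for j, qj in Q.items() if j <= k)
    return p

lam = 2.0
Q = {1: 0.5, 2: 0.3, 5: 0.2}           # hypothetical jump distribution on N
pmf = compound_poisson_pmf(lam, Q, kmax=300)

q1 = sum(j * qj for j, qj in Q.items())        # first moment of Q
q2 = sum(j * j * qj for j, qj in Q.items())    # second moment of Q
mean = sum(k * pk for k, pk in enumerate(pmf))
var = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))
print("total mass:", sum(pmf))
print("mean:", mean, "vs lam*q1 =", lam * q1)
print("variance:", var, "vs lam*q2 =", lam * q2)
```

Truncating at `kmax=300` is harmless here because the jump distribution has bounded support; a heavy-tailed $Q$ would need a much larger truncation level.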

From the above discussion we observe that the Herbst argument is heavily dependent on the use of 
moment-generating functions, a fact which implicitly assumes the existence of exponential moments. 
Our main contribution is a modification of the Herbst argument for the case when the random variables 
of interest do not satisfy such exponential integrability conditions. We derive what appear to be 
perhaps the first concentration inequalities for a class of infinitely divisible random variables that 
have finite variance but do not have finite exponential moments. Apart from the derivation of the 
present results, the modified Herbst argument is applicable in a variety of other cases and may be of
independent interest. In particular, this approach can be applied to prove dimension-free inequalities 
for compound Poisson vectors, as well as power-law concentration bounds for more general infinitely 
divisible laws. 

Our starting point is the following modified log-Sobolev inequality for the compound Poisson
measure $CP_{\lambda,Q}$.

Theorem 1. [Modified Log-Sobolev Inequality for Compound Poisson Measures] For
any $\lambda > 0$, any probability measure $Q$ on $\mathbb{N}$ and any bounded $f : \mathbb{Z}_+ \to \mathbb{R}$,

$$\mathrm{Ent}_{CP_{\lambda,Q}}(e^f) \le \lambda \sum_{j \ge 1} Q_j E_{CP_{\lambda,Q}}\Big[e^f\big\{|D^j f|e^{|D^j f|} - e^{|D^j f|} + 1\big\}\Big], \tag{3}$$

where $D^j f(k) = f(k + j) - f(k)$, for $j, k \in \mathbb{Z}_+$.

This can be derived easily from [15, Cor. 4.2] of Wu, which was established using elaborate stochastic
calculus techniques. In Section 3 we also give an alternative, elementary proof, by tensorizing the
Bobkov-Ledoux result (2). Note the elegant similarity between the bounds in (2) and (3).
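The inequality of Theorem 1 can be checked numerically in small cases. The sketch below evaluates both sides for one bounded test function and one finitely supported $Q$; the test function, the parameters, and the helper names are assumptions chosen for illustration, and the law is computed with the Panjer recursion rather than by any method from the paper.

```python
import math

def compound_poisson_pmf(lam, Q, kmax):
    """pmf of CP(lam, Q) on {0, ..., kmax} via the Panjer recursion."""
    p = [math.exp(-lam)] + [0.0] * kmax
    for k in range(1, kmax + 1):
        p[k] = (lam / k) * sum(j * qj * p[k - j] for j, qj in Q.items() if j <= k)
    return p

def eta(x):
    """eta(x) = x e^x - e^x + 1, the function inside the braces of the inequality."""
    return x * math.exp(x) - math.exp(x) + 1.0

def entropy(pmf, g):
    """Ent(g) = E[g log g] - E[g] log E[g] for strictly positive values g[k]."""
    m = sum(pk * gk for pk, gk in zip(pmf, g))
    return sum(pk * gk * math.log(gk) for pk, gk in zip(pmf, g)) - m * math.log(m)

lam, Q, kmax = 2.0, {1: 0.5, 2: 0.3, 5: 0.2}, 300   # hypothetical parameters
pmf = compound_poisson_pmf(lam, Q, kmax)

def f(k):
    return math.sin(0.7 * k)        # an arbitrary bounded test function

lhs = entropy(pmf, [math.exp(f(k)) for k in range(kmax + 1)])
rhs = lam * sum(
    qj * sum(pk * math.exp(f(k)) * eta(abs(f(k + j) - f(k)))
             for k, pk in enumerate(pmf))
    for j, qj in Q.items()
)
print("Ent(e^f) =", lhs, " right-hand side =", rhs)
```

A single instance proves nothing, of course; the point is only to show how the two sides of the inequality are assembled from $f$, $Q$ and the discrete differences $D^j f$.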

We then apply our modified Herbst argument to establish concentration bounds for CP($\lambda, Q$)
measures under various assumptions on the tail behavior of $Q$. These are stated in Section 2 and
proved in Section 4. For example, we establish the following polynomial concentration result. Recall
that a function $f : \mathbb{Z}_+ \to \mathbb{R}$ is $K$-Lipschitz if $|f(j+1) - f(j)| \le K$ for all $j \in \mathbb{Z}_+$.

Corollary 2. [Polynomial Concentration] Suppose that $Z$ has CP($\lambda, Q$) distribution where $Q$
has finite moments up to order $L$,

$$L = \sup\Big\{\tau \ge 1 : \sum_{j \ge 1} j^\tau Q_j < \infty\Big\} > 1,$$

and write $q_r = \sum_{j \ge 1} j^r Q_j$ for its integer moments.

If $f : \mathbb{Z}_+ \to \mathbb{R}$ is $K$-Lipschitz, then for any positive integer $n < L$ and any $t > 0$ we have,

$$\Pr\{|f(Z) - E[f(Z)]| > t\} \le A \cdot B^n \cdot t^{-n},$$

where for the constants $A$, $B$ we can take,

$$A = \exp\Big\{\lambda \sum_{r=1}^{n} \binom{n}{r} q_r K^r - \lambda n \log K\Big\},$$
$$B = 2|f(0)| + 2K\lambda q_1 + 1.$$
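A short sketch of how the constants of Corollary 2 might be evaluated in practice, assuming $A = \exp\{\lambda\sum_{r=1}^{n}\binom{n}{r} q_r K^r - \lambda n \log K\}$ and $B = 2|f(0)| + 2K\lambda q_1 + 1$. The example takes $f$ to be the identity (so $K = 1$, $f(0) = 0$) and a finitely supported $Q$, for which every integer $n$ is admissible; all names and parameter values are hypothetical.

```python
import math

def compound_poisson_pmf(lam, Q, kmax):
    """pmf of CP(lam, Q) on {0, ..., kmax} via the Panjer recursion."""
    p = [math.exp(-lam)] + [0.0] * kmax
    for k in range(1, kmax + 1):
        p[k] = (lam / k) * sum(j * qj * p[k - j] for j, qj in Q.items() if j <= k)
    return p

def corollary2_bound(lam, Q, K, f0, n, t):
    """A * B**n * t**(-n), with A and B as described in the lead-in."""
    def q(r):
        return sum(j ** r * qj for j, qj in Q.items())   # integer moment q_r
    A = math.exp(lam * sum(math.comb(n, r) * q(r) * K ** r for r in range(1, n + 1))
                 - lam * n * math.log(K))
    B = 2.0 * abs(f0) + 2.0 * K * lam * q(1) + 1.0
    return A * B ** n / t ** n

lam, Q, K = 0.5, {1: 0.5, 2: 0.5}, 1.0        # hypothetical parameters
pmf = compound_poisson_pmf(lam, Q, kmax=200)
mean = lam * sum(j * qj for j, qj in Q.items())

def exact_two_sided_tail(t):
    """Pr{|Z - E[Z]| > t} for f the identity."""
    return sum(pk for k, pk in enumerate(pmf) if abs(k - mean) > t)

checks = [(t, exact_two_sided_tail(t), corollary2_bound(lam, Q, K, 0.0, 2, t))
          for t in (5.0, 10.0, 20.0, 50.0)]
for t, exact, bound in checks:
    print(f"t={t:5.1f}  exact={exact:.3e}  bound (n=2)={bound:.3e}")
```

For moderate $t$ the bound exceeds 1 and carries no information, in line with the remark that it is not useful for small $t$.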

Various stronger and more general results are given in Section 2. There, at the price of more
complex constants, we get bounds which, for large $t$, are of (the optimal) order $t^{-L+\delta}$ for any
$\delta > 0$. Moreover, since the only property of the compound Poisson distribution used in the proof is
that it satisfies the functional inequality of Theorem 1, similar bounds are immediately seen to hold
for any measure that satisfies such an inequality. Note that although the bound of Corollary 2 is not
useful for small $t$, it is in general impossible to obtain meaningful results for arbitrary $t > 0$. For
example, if $f$ is the identity function and $Z \sim$ Poisson($\lambda$) where $\lambda$ is of the form $m + 1/2$ for
an integer $m$, then $|Z - E(Z)| \ge 1/2$ with probability 1; a more detailed discussion is given in
Section 2.

As noted above, these appear to be some of the first non-exponential concentration bounds that
have been derived, with the few recent exceptions discussed next. Of the extensive current literature
on concentration, our results are most closely related to the work of Houdré and his co-authors. Using
sophisticated technical tools derived from the "covariance representations" developed in [?], Houdré
[5] obtained concentration bounds for Lipschitz functions of infinitely divisible random vectors with
finite exponential moments. In [?], truncation and explicit computations were used to extend these
results to the class of stable laws on $\mathbb{R}^d$, and the preprint [1] extends them further to a large
class of functionals on Poisson space. To our knowledge, the results in [?][1] are the only
concentration bounds with power-law decay to date. But when specialized to scalar random variables
they only apply to distributions with infinite variance, whereas our results hold for compound Poisson
random variables with a finite $L$th moment for any $L > 1$. Although the methods of [?][1] as well as
the form of the results themselves are very different from those derived here, some more detailed
comparisons are possible as outlined in Section 2. Finally, the recent paper [3] contains a different
extension of the Herbst argument to certain situations where exponential moments do not exist. The
focus there is on moment inequalities for functions of independent random variables, primarily
motivated by statistical applications.



2 Concentration Bounds 

The following result is the main motivation for this paper. It illustrates the potential for using the 
Herbst argument even in cases where the existence of exponential moments fails or cannot be assumed. 

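Theorem 3. [Power-law Concentration] Suppose that $Z$ has CP($\lambda, Q$) distribution where $Q$
has finite moments up to order $L$,

$$L = \sup\Big\{\tau \ge 1 : \sum_{j \ge 1} j^\tau Q_j < \infty\Big\} > 1,$$

and write $q_1 = \sum_{j \ge 1} j Q_j$ for its first moment.

(i) If $f : \mathbb{Z}_+ \to \mathbb{R}$ is $K$-Lipschitz, then for any $t > 0$ and $\epsilon > 0$ we have,

$$\Pr\{|f(Z) - Ef(Z)| > t\} \le \exp\Big\{\inf_{0 < \alpha < L}\Big[I_\epsilon(\alpha) + \alpha \log\Big(\frac{2|f(0)| + 2K\lambda q_1 + \epsilon}{t}\Big)\Big]\Big\}, \tag{4}$$

where

$$I_\epsilon(\alpha) = \lambda \sum_{j \ge 1} Q_j\big[C_{j,\epsilon}^\alpha - 1 - \alpha \log C_{j,\epsilon}\big] \quad \text{and} \quad C_{j,\epsilon} = 1 + \frac{jK}{\epsilon}.$$

(ii) The upper bound (4) is meaningful (less than 1) iff $t > T := 2|f(0)| + 2K\lambda q_1 + \epsilon$, and then,

$$\Pr\{|f(Z) - Ef(Z)| > t\} \le \exp\Big\{-\int_0^{\log(t/T)} i_\epsilon^{-1}(s)\,ds\Big\},$$

where $i_\epsilon(\alpha) := I_\epsilon'(\alpha) = \lambda \sum_{j \ge 1} Q_j\big[C_{j,\epsilon}^\alpha - 1\big] \log C_{j,\epsilon}$.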
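The bound of part (i) can be explored numerically by replacing the infimum over $\alpha$ with a minimum over a finite grid, which can only weaken the bound. The sketch below does this for $f$ the identity and a finitely supported $Q$ (so that every $\alpha > 0$ is admissible), taking $I_\epsilon(\alpha) = \lambda\sum_j Q_j[C_{j,\epsilon}^\alpha - 1 - \alpha\log C_{j,\epsilon}]$ with $C_{j,\epsilon} = 1 + jK/\epsilon$ and $T = 2|f(0)| + 2K\lambda q_1 + \epsilon$. The grid, the parameters, and the function names are assumptions for illustration.

```python
import math

def compound_poisson_pmf(lam, Q, kmax):
    """pmf of CP(lam, Q) on {0, ..., kmax} via the Panjer recursion."""
    p = [math.exp(-lam)] + [0.0] * kmax
    for k in range(1, kmax + 1):
        p[k] = (lam / k) * sum(j * qj * p[k - j] for j, qj in Q.items() if j <= k)
    return p

def I_eps(alpha, lam, Q, K, eps):
    """I_eps(alpha) = lam * sum_j Q_j [C^alpha - 1 - alpha log C], C = 1 + jK/eps."""
    total = 0.0
    for j, qj in Q.items():
        C = 1.0 + j * K / eps
        total += qj * (C ** alpha - 1.0 - alpha * math.log(C))
    return lam * total

def theorem3_bound(t, lam, Q, K, f0, eps, alphas):
    """exp of the grid minimum of I_eps(alpha) + alpha * log(T / t)."""
    q1 = sum(j * qj for j, qj in Q.items())
    T = 2.0 * abs(f0) + 2.0 * K * lam * q1 + eps
    exponent = min(I_eps(a, lam, Q, K, eps) + a * math.log(T / t) for a in alphas)
    return math.exp(exponent)

lam, Q, K, eps = 0.5, {1: 0.5, 2: 0.5}, 1.0, 1.0     # hypothetical parameters
alphas = [0.01 * i for i in range(1, 1001)]          # grid on (0, 10]
pmf = compound_poisson_pmf(lam, Q, kmax=200)
mean = lam * sum(j * qj for j, qj in Q.items())

results = []
for t in (2.0, 5.0, 10.0, 30.0):                     # here T = 2.5
    exact = sum(pk for k, pk in enumerate(pmf) if abs(k - mean) > t)
    results.append((t, exact, theorem3_bound(t, lam, Q, K, 0.0, eps, alphas)))
    print(f"t={t:5.1f}  exact={exact:.3e}  bound={results[-1][2]:.3e}")
```

With these parameters $T = 2.5$, so the value printed for $t = 2$ is at least 1 while the values for larger $t$ drop below 1, matching the threshold described in part (ii).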
Remarks.

1. Taking $\alpha = L - \delta$ for any $\delta > 0$ in the exponent of (4), we get a bound on the tails of
$f(Z)$ of order $t^{-(L-\delta)}$ for large $t$. By considering the case where $f$ is the identity function
$f(k) = k$, $k \in \mathbb{Z}_+$, we see that this power-law behavior is in fact optimal. In particular, this
shows that the tail of the CP($\lambda, Q$) law decays like the tail of $Q$, giving a quantitative version of
a classical result from [13].

2. As will become evident from the proof, Theorem 3 holds for any random variable $Z$ with law $\mu$
instead of $CP_{\lambda,Q}$, as long as $\mu$ satisfies the log-Sobolev inequality of Theorem 1 with respect to
some probability measure $Q$ on $\mathbb{N}$ and some $\lambda > 0$, and assuming that $\mu$ has finite moments
up to order $L$. The bound (4) remains exactly the same, except that the first moment $M_1 = E[Z]$ of
$\mu$ replaces $\lambda q_1$.

3. Integrability properties follow immediately from the theorem: For any $K$-Lipschitz function $f$,
$E_{CP_{\lambda,Q}}[|f|^\tau] < \infty$ for all $\tau < L$, and the same holds for any law $\mu$ as in the previous
remark.

Since the support of CP($\lambda, Q$) is $\mathbb{Z}_+$, we would naturally expect the range of $f$ to be highly
disconnected. Therefore, to somewhat simplify the expression in the exponent of (4), next we
concentrate on the (typical) class of functions $f : \mathbb{Z}_+ \to \mathbb{R}$ whose mean under $CP_{\lambda,Q}$ is
not in the range of $f$:

Corollary 4. [Power-law Concentration for Nice $f$] Suppose that $Z$ has CP($\lambda, Q$) distribution
where $Q$ has finite moments up to order $L > 1$, and write $q_1$ for its first moment. If
$f : \mathbb{Z}_+ \to \mathbb{R}$ is $K$-Lipschitz and there exists $\epsilon > 0$ such that

$$|f(j) - E[f(Z)]| \ge \epsilon, \quad \text{for all } j \in \mathbb{Z}_+,$$

then for any $t > 0$ we have,

$$\Pr\{|f(Z) - Ef(Z)| > t\} \le \exp\Big\{\inf_{0 < \alpha < L}\big[I_\epsilon(\alpha) + \alpha \log(D/t)\big]\Big\}, \tag{5}$$

where $I_\epsilon(\alpha)$ is defined as in Theorem 3, and $D := E|f(Z) - E[f(Z)]|$.
Remarks. 

4. Similarly to Theorem 3, this corollary gives quantitative bounds on the tail of $f(Z)$ of the order
of $t^{-(L-\delta)}$ for any $\delta > 0$. Also, the same result holds for any law $\mu$ as in Remark 2.

5. The exponent in (5) becomes negative exactly when $t > D$, for the same reasons as in Theorem 3.
On the other hand, it is obvious that any bound can only be useful for
$t \ge D_0 := \min_{k \in \mathbb{Z}_+} |f(k) - E[f(Z)]|$, since the probability that $|f(Z) - E[f(Z)]| \ge D_0$ is
equal to one. Moreover, $D$ and $D_0$ coincide in many special cases, as, e.g., when the range of $f$ is a
lattice in $\mathbb{R}$ and its mean $E[f(Z)]$ is on the midpoint between two lattice points. In this sense, the
restriction $t > D$ is quite natural.

6. The expression $2|f(0)| + 2K\lambda q_1$ in Theorem 3 is simply an upper bound to the constant
$D = E|f(Z) - E[f(Z)]|$ appearing in Corollary 4. In both cases, when $L > 2$ it is possible to obtain
potentially sharper results by bounding $D$ above using Jensen's inequality by,

$$\big[K^2 \lambda q_2 + \{|f(0)| + K\lambda q_1\}^2\big]^{1/2},$$

where $q_2$ is the second moment of $Q$. Similar expressions can be derived in the case of higher moments.

7. The most closely related results to our power-law concentration bounds appear to be in the
recent preprint [1].$^1$ The relevant bounds in [1] specialized to Lipschitz functions of CP($\lambda, Q$)
random variables require that the probability measure $Q$ be non-atomic, which excludes all the cases
we consider. But shortly after the first writing of this paper C. Houdré in a personal communication
informed us that this assumption can be removed by an appropriate construction. The details have
not been checked by us, but in the following comparison we assume that it does not change the
statements in [1]. The main assumptions in [1] are that the random variable of interest has infinite
variance, and also certain growth conditions. Because of the infinite-variance assumption, the majority
of the results in this paper (corresponding to $L > 2$) apply to cases that are not covered in [1]. As
for the growth conditions, they are convenient to check in several important special classes, e.g., for
$\alpha$-stable laws on $\mathbb{R}$, but they can be unwieldy in the compound Poisson case, especially as they
depend on $Q$ in an intricate way. On the other hand, if $Q$ has infinite variance, [1, Cor. 5.3] gives
optimal-order bounds, including the case when $Q$ has infinite mean, for which our results do not apply.

$^1$ The results in [1] are stated in the much more general setting of functionals on an abstract Poisson
space. Using the Wiener-Itô decomposition, any infinitely divisible random variable can be represented
as a Poisson stochastic integral, which in turn can be realized as a "nice" functional on Poisson space.

Next we show how the Herbst argument can be used to recover precisely a result of [5] in the case
when we have exponential moments.

Theorem 5. [Exponential Concentration] [5] Suppose that $Z$ has CP($\lambda, Q$) distribution where
$Q$ has finite exponential moments up to order $M$,

$$M = \sup\Big\{\tau > 0 : \sum_{j \ge 1} e^{\tau j} Q_j < \infty\Big\} > 0.$$

If $f : \mathbb{Z}_+ \to \mathbb{R}$ is $K$-Lipschitz, then for any $t > 0$ we have,

$$\Pr\{f(Z) - Ef(Z) > t\} \le \exp\Big\{\inf_{0 < \alpha < M/K}\big[H(\alpha) - \alpha t\big]\Big\} = \exp\Big\{-\int_0^t h^{-1}(s)\,ds\Big\}, \tag{6}$$

where $H(\alpha) := \lambda \sum_{j \ge 1} Q_j\big[e^{\alpha K j} - 1 - \alpha K j\big]$, and $h^{-1}$ is the inverse of $h(\alpha) := H'(\alpha)$.
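As a sanity check on the exponent in (6), consider the Poisson case $Q = \delta_1$, $K = 1$, where $H(\alpha) = \lambda(e^\alpha - 1 - \alpha)$. Setting $H'(\alpha) = t$ gives $\alpha^* = \log(1 + t/\lambda)$, and a direct calculation gives the minimum $t - (\lambda + t)\log(1 + t/\lambda)$. The sketch below recovers this value by a ternary search, which is valid because $H(\alpha) - \alpha t$ is convex. The search interval `alpha_max` and the function names are assumptions for the example; for a general $Q$ the interval must stay below $M/K$.

```python
import math

def H(alpha, lam, Q, K):
    """H(alpha) = lam * sum_j Q_j [exp(alpha K j) - 1 - alpha K j]."""
    return lam * sum(qj * (math.exp(alpha * K * j) - 1.0 - alpha * K * j)
                     for j, qj in Q.items())

def theorem5_exponent(t, lam, Q, K, alpha_max=10.0, iters=200):
    """Approximate inf over (0, alpha_max) of H(alpha) - alpha * t by ternary search."""
    def objective(a):
        return H(a, lam, Q, K) - a * t
    lo, hi = 0.0, alpha_max
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) < objective(m2):
            hi = m2          # the minimiser lies to the left of m2
        else:
            lo = m1          # the minimiser lies to the right of m1
    return objective(0.5 * (lo + hi))

lam, t = 3.0, 4.0                                   # hypothetical values
numeric = theorem5_exponent(t, lam, {1: 1.0}, 1.0)  # Q = delta_1, K = 1
closed_form = t - (lam + t) * math.log(1.0 + t / lam)
print("numeric exponent:", numeric)
print("closed form:     ", closed_form)
print("tail bound:      ", math.exp(numeric))
```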
Remarks. 

8. Theorem 1 of [5] gives concentration bounds for a class of infinitely divisible laws with finite
exponential moments, and in the compound Poisson case it reduces precisely to (6), which also applies
to any random variable $Z$ whose law satisfies the result of Theorem 1. It is also interesting to note
that Theorem 5 can be derived by applying [15, Prop. 3.2] to a compound Poisson random variable
(constructed via the Wiener-Itô decomposition), and then using Markov's inequality.

9. Theorems 3 and 5 easily generalize to Hölder continuous functions. In the discrete setting of
$\mathbb{Z}_+$, $f$ is $K$-Lipschitz iff it is Hölder continuous for every exponent $\beta \ge 1$ with the same
constant $K$. But if $f$ is Hölder continuous with exponent $\beta < 1$, this more stringent requirement
makes it possible to strengthen Theorem 3 and Theorem 5, by respectively redefining,
$C_{j,\epsilon} = 1 + j^\beta K/\epsilon$ and

$$H(\alpha) = \lambda \sum_{j \ge 1} Q_j\big[e^{\alpha K j^\beta} - 1 - \alpha K j^\beta\big].$$

10. While all our power-law results dealt with two-sided deviations, the bound in Theorem 5 is
one-sided. The reason for this discrepancy is that the last step in all the relevant proofs is an
application of Markov's inequality, which leads us to restrict attention to nonnegative random
variables. When exponential moments exist, the natural consideration of the exponential of the random
variable takes care of this issue, but in the case of regular moments we are forced to take absolute
values.

3 Proof of Theorem 1 

An alternative representation for the law of a CP($\lambda, Q$) random variable $Z$ is in terms of the series

$$Z \overset{d}{=} \sum_{j=1}^{\infty} j\,Y_j, \qquad Y_j \sim \text{Poisson}(\lambda Q_j), \tag{7}$$

where the $Y_j$ are independent.
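The series representation can be verified numerically for a finitely supported $Q$: convolving the laws of the scaled Poisson variables $j\,Y_j$, with independent $Y_j \sim$ Poisson($\lambda Q_j$), should reproduce the compound Poisson probabilities. In the sketch below the reference values come from the Panjer recursion (a standard tool that is not part of this paper); parameters and function names are hypothetical.

```python
import math

def poisson_pmf(mu, n):
    """Poisson(mu) probabilities for m = 0..n."""
    p = [math.exp(-mu)]
    for m in range(1, n + 1):
        p.append(p[-1] * mu / m)
    return p

def compound_poisson_pmf(lam, Q, kmax):
    """Reference pmf of CP(lam, Q) on {0, ..., kmax} via the Panjer recursion."""
    p = [math.exp(-lam)] + [0.0] * kmax
    for k in range(1, kmax + 1):
        p[k] = (lam / k) * sum(j * qj * p[k - j] for j, qj in Q.items() if j <= k)
    return p

def series_pmf(lam, Q, kmax):
    """Law of sum_j j * Y_j on {0, ..., kmax}, Y_j ~ Poisson(lam * Q_j) independent."""
    pmf = [1.0] + [0.0] * kmax                  # start from a point mass at 0
    for j, qj in Q.items():
        y = poisson_pmf(lam * qj, kmax // j)
        new = [0.0] * (kmax + 1)
        for k, pk in enumerate(pmf):            # convolve with the law of j * Y_j
            for m, ym in enumerate(y):
                if k + j * m > kmax:
                    break
                new[k + j * m] += pk * ym
        pmf = new
    return pmf

lam, Q, kmax = 2.0, {1: 0.5, 2: 0.3, 5: 0.2}, 60    # hypothetical parameters
a = series_pmf(lam, Q, kmax)
b = compound_poisson_pmf(lam, Q, kmax)
max_gap = max(abs(x - y) for x, y in zip(a, b))
print("largest pointwise difference:", max_gap)
```

Because all summands are nonnegative, truncating every convolution at `kmax` leaves the probabilities of the values $0, \ldots,$ `kmax` exact up to rounding.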

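For each $n$, let $\mu_n$ denote the joint (product) distribution of $(Y_1, \ldots, Y_n)$. In this instance,
the tensorization property of the entropy [?] can be expressed as

$$\mathrm{Ent}_{\mu_n}(G) \le \sum_{j=1}^{n} E\Big[\mathrm{Ent}_{P_{\lambda Q_j}}\big(G_j(Y_1, \ldots, Y_{j-1}, \cdot, Y_{j+1}, \ldots, Y_n)\big)\Big], \tag{8}$$

where $G : \mathbb{Z}_+^n \to \mathbb{R}_+$ is an arbitrary function, and the entropy on the right-hand side is applied
to the restriction $G_j$ of $G$ to its $j$th co-ordinate. Now given an $f$ as in the statement of the theorem,
define the functions $G : \mathbb{Z}_+^n \to \mathbb{R}_+$ and $H : \mathbb{Z}_+^n \to \mathbb{R}$ by

$$H(y_1, \ldots, y_n) = f\Big(\sum_{k=1}^{n} k\,y_k\Big), \qquad y_1^n \in \mathbb{Z}_+^n,$$

and $G = e^H$. Let $\tilde\mu_n$ denote the distribution of the sum $S_n := \sum_{k=1}^{n} k\,Y_k$ and write $H_j$ for
the restriction of $H$ to the variable $y_j$ with the remaining $y_i$'s fixed. Applying (8) to $G$ we obtain,

$$\mathrm{Ent}_{\tilde\mu_n}(e^f) = \mathrm{Ent}_{\mu_n}(G) \le \sum_{j=1}^{n} E\Big[\mathrm{Ent}_{P_{\lambda Q_j}}\big(G_j(Y_1, \ldots, Y_{j-1}, \cdot, Y_{j+1}, \ldots, Y_n)\big)\Big]$$
$$= \sum_{j=1}^{n} E\Big[\mathrm{Ent}_{P_{\lambda Q_j}}\big(e^{H_j(Y_1, \ldots, Y_{j-1}, \cdot, Y_{j+1}, \ldots, Y_n)}\big)\Big].$$

Using the Bobkov-Ledoux inequality (2) to bound each term in the above sum, and noting that,
trivially, $DH_j(y_1, \ldots, y_n) = D^j f\big(\sum_{k=1}^{n} k\,y_k\big)$,

$$\mathrm{Ent}_{\tilde\mu_n}(e^f) \le \lambda \sum_{j=1}^{n} Q_j E_{\tilde\mu_n}\Big[e^f\big\{|D^j f|e^{|D^j f|} - e^{|D^j f|} + 1\big\}\Big] \le \lambda \sum_{j=1}^{\infty} Q_j E_{\tilde\mu_n}\Big[e^f\big\{|D^j f|e^{|D^j f|} - e^{|D^j f|} + 1\big\}\Big], \tag{9}$$

where the last inequality follows from the fact that $xe^x - e^x + 1 \ge 0$ for $x \ge 0$.

Finally, we want to take the limit as $n \to \infty$ in (9). Since $\tilde\mu_n \Rightarrow CP_{\lambda,Q}$ as $n \to \infty$ by (7),
and since $f$ is bounded, by bounded convergence

$$\mathrm{Ent}_{\tilde\mu_n}(e^f) \to \mathrm{Ent}_{CP_{\lambda,Q}}(e^f), \qquad n \to \infty. \tag{10}$$

Similarly, changing the order of summation and expectation in the right-hand side of (9) by Fubini,
taking $n \to \infty$ by bounded convergence, and interchanging the order again, it converges to

$$\lambda \sum_{j \ge 1} Q_j E_{CP_{\lambda,Q}}\Big[e^f\big\{|D^j f|e^{|D^j f|} - e^{|D^j f|} + 1\big\}\Big].$$

This together with (10) implies that (9) yields the required result upon taking $n \to \infty$. □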
4 Concentration Proofs

For notational convenience we define the function $\eta(x) := xe^x - e^x + 1$, $x \in \mathbb{R}$, and note that it is
non-negative; it achieves its minimum at 0; it is strictly convex on $(-1, \infty)$ and strictly concave on
$(-\infty, -1)$; it decreases from 1 to 0 as $x$ increases to zero, and it is increasing to infinity for $x > 0$.
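The listed properties follow from $\eta'(x) = xe^x$ and $\eta''(x) = (1 + x)e^x$. A minimal numerical sketch that checks them on a grid (the grid itself is an arbitrary choice):

```python
import math

def eta(x):
    """eta(x) = x e^x - e^x + 1."""
    return x * math.exp(x) - math.exp(x) + 1.0

xs = [i / 10.0 for i in range(-200, 51)]     # grid on [-20, 5], step 0.1
values = [eta(x) for x in xs]
zero_index = xs.index(0.0)                   # position of x = 0 in the grid

print("eta(0) =", eta(0.0))
print("eta(-40) =", eta(-40.0), "(close to the limit 1 at minus infinity)")
print("minimum over the grid:", min(values))
```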

The main technical ingredient of the paper is the following proposition, which is based on a 
modification of the Herbst argument. 

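Proposition 7. Suppose that $Z$ has CP($\lambda, Q$) distribution where $Q$ has finite moments up to order
$L > 1$. If $f : \mathbb{Z}_+ \to \mathbb{R}$ is bounded and $K$-Lipschitz, then for $t > 0$, $\epsilon > 0$ and
$\alpha \in (0, L)$, we have,

$$\Pr\{|f(Z) - Ef(Z)| > t\} \le \exp\big\{I_\epsilon(\alpha) + \alpha E[\log g_\epsilon(Z)] - \alpha \log t\big\},$$

where $I_\epsilon(\alpha)$ is defined as in Theorem 3 and

$$g_\epsilon(x) := |f(x) - E[f(Z)]|\,\mathbb{I}_{\{|f(x) - E[f(Z)]| \ge \epsilon\}} + \epsilon\,\mathbb{I}_{\{|f(x) - E[f(Z)]| < \epsilon\}}.$$

Proof of Proposition 7. Since $f$ is bounded, by its definition $g_\epsilon$ is also bounded above by
$2\|f\|_\infty + \epsilon$ and below by $\epsilon$. Therefore, the moment generating function
$G(\tau) := E[g_\epsilon(Z)^\tau]$ is well-defined for all $\tau > 0$. Moreover, since both $g_\epsilon$ and $\log g_\epsilon$
are bounded, dominated convergence justifies the following differentiation under the integral,

$$G'(\tau) = E\Big[\frac{\partial}{\partial \tau} e^{\tau \log g_\epsilon(Z)}\Big] = E\big[g_\epsilon(Z)^\tau \log g_\epsilon(Z)\big],$$

so we can relate $G(\tau)$ to the entropy of $g_\epsilon^\tau$,

$$\mathrm{Ent}_{CP_{\lambda,Q}}(g_\epsilon^\tau) = \tau G'(\tau) - G(\tau)\log G(\tau) = \tau^2 G(\tau)\frac{d}{d\tau}\Big[\frac{\log G(\tau)}{\tau}\Big]. \tag{11}$$

In order to bound this entropy we will apply Theorem 1 to the function $\phi(x) := \tau \log g_\epsilon(x)$.
First we observe that $g_\epsilon$ can be written as the composition $g_\epsilon = h \circ (f - E[f(Z)])$, where it
is easy to verify that the function $h(x) := |x|\,\mathbb{I}_{\{|x| \ge \epsilon\}} + \epsilon\,\mathbb{I}_{\{|x| < \epsilon\}}$ is
1-Lipschitz. And since $f$ is $K$-Lipschitz by assumption, $g_\epsilon$ is itself $K$-Lipschitz. Hence we can
bound $D^j\phi$ as

$$D^j\phi(x) = \tau \log\Big[\frac{g_\epsilon(x + j)}{g_\epsilon(x)}\Big] \le \tau \log\Big[1 + \frac{|D^j g_\epsilon(x)|}{g_\epsilon(x)}\Big] \le \tau \log\Big[1 + \frac{jK}{\epsilon}\Big] = \tau \log C_{j,\epsilon}.$$

The same argument also yields a corresponding lower bound, so that $|D^j\phi(x)| \le \tau \log C_{j,\epsilon}$.
Applying Theorem 1 to $\phi$ gives,

$$\mathrm{Ent}_{CP_{\lambda,Q}}(g_\epsilon^\tau) = \mathrm{Ent}_{CP_{\lambda,Q}}(e^\phi) \le \lambda \sum_{j \ge 1} Q_j E_{CP_{\lambda,Q}}\big[e^\phi\,\eta(|D^j\phi|)\big] \le \lambda G(\tau) \sum_{j \ge 1} Q_j\,\eta(\tau \log C_{j,\epsilon}),$$

since $\eta(x)$ is increasing for $x \ge 0$. Combining this with (11) we obtain the following differential
inequality valid for all $\tau > 0$:

$$\frac{d}{d\tau}\Big[\frac{\log G(\tau)}{\tau}\Big] \le \lambda \sum_{j \ge 1} Q_j \frac{\eta(\tau \log C_{j,\epsilon})}{\tau^2}.$$

To solve, we integrate with respect to $\tau$ on $(0, \alpha]$ to obtain, for any $\alpha < L$,

$$\frac{\log G(\alpha)}{\alpha} - E[\log g_\epsilon(Z)] \le \lambda \sum_{j \ge 1} Q_j \int_0^\alpha \frac{\eta(\tau \log C_{j,\epsilon})}{\tau^2}\,d\tau = \lambda \sum_{j \ge 1} Q_j \log C_{j,\epsilon} \int_0^{\alpha \log C_{j,\epsilon}} \frac{\eta(s)}{s^2}\,ds$$
$$= \lambda \sum_{j \ge 1} Q_j \log C_{j,\epsilon}\Big[\frac{e^{\alpha \log C_{j,\epsilon}} - 1}{\alpha \log C_{j,\epsilon}} - 1\Big] = I_\epsilon(\alpha)/\alpha,$$

or, equivalently,

$$G(\alpha) \le \exp\big\{\alpha E[\log g_\epsilon(Z)] + I_\epsilon(\alpha)\big\}, \tag{12}$$

where the exchange of sum and integral is justified by Fubini's theorem since all the quantities involved
are nonnegative. To complete the proof we observe that $g_\epsilon \ge |f - E[f(Z)]|$, so that by (12) and an
application of Markov's inequality,

$$\Pr\{|f(Z) - E[f(Z)]| > t\} \le \Pr\{g_\epsilon(Z) > t\} = \Pr\{g_\epsilon(Z)^\alpha > t^\alpha\} \le t^{-\alpha} G(\alpha) \le \exp\big\{I_\epsilon(\alpha) + \alpha E[\log g_\epsilon(Z)] - \alpha \log t\big\}.$$

□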
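The integration step in the proof above rests on the identity $\int_0^u \eta(s)/s^2\,ds = (e^u - 1)/u - 1$, which holds because $\frac{d}{ds}\big[(e^s - 1)/s\big] = \eta(s)/s^2$. The sketch below checks the identity with a composite Simpson rule and then confirms that $\lambda\sum_j Q_j \log C_{j,\epsilon}\int_0^{\alpha\log C_{j,\epsilon}} \eta(s)/s^2\,ds$ agrees with $I_\epsilon(\alpha)/\alpha$. The quadrature routine, the parameter values, and all names are assumptions for this illustration.

```python
import math

def eta(x):
    return x * math.exp(x) - math.exp(x) + 1.0

def integrand(s):
    """eta(s) / s^2, using its Taylor expansion near 0 to avoid cancellation."""
    if abs(s) < 1e-3:
        return 0.5 + s / 3.0 + s * s / 8.0
    return eta(s) / (s * s)

def simpson(g, a, b, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3.0

def closed_form(u):
    return (math.exp(u) - 1.0) / u - 1.0

def I_eps(alpha, lam, Q, K, eps):
    """I_eps(alpha) = lam * sum_j Q_j [C^alpha - 1 - alpha log C], C = 1 + jK/eps."""
    return lam * sum(qj * ((1.0 + j * K / eps) ** alpha - 1.0
                           - alpha * math.log(1.0 + j * K / eps))
                     for j, qj in Q.items())

def I_over_alpha_by_integration(alpha, lam, Q, K, eps):
    total = 0.0
    for j, qj in Q.items():
        log_c = math.log(1.0 + j * K / eps)
        total += qj * log_c * simpson(integrand, 0.0, alpha * log_c)
    return lam * total

gaps = [abs(simpson(integrand, 0.0, u) - closed_form(u)) for u in (0.5, 2.0, 5.0)]
lam, Q, K, eps, alpha = 2.0, {1: 0.5, 2: 0.3, 5: 0.2}, 1.0, 1.0, 1.5   # hypothetical
lhs = I_over_alpha_by_integration(alpha, lam, Q, K, eps)
rhs = I_eps(alpha, lam, Q, K, eps) / alpha
print("quadrature errors:", gaps)
print("integrated form:", lhs, " I_eps(alpha)/alpha:", rhs)
```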



Using Proposition 7 we can prove our main results, Theorem 3 and Corollaries 2 and 4. 

Proof of Theorem 3. The first step is to bring the upper bound in Proposition 7 into a more
tractable form. Observe that by its definition, $g_\epsilon(x) \le |f(x) - E[f(Z)]| + \epsilon$, so that, by Jensen's
inequality, for a function $f$ satisfying the hypotheses of Proposition 7,

$$E[\log g_\epsilon(Z)] \le \log E[g_\epsilon(Z)] \le \log\big[E\{|f(Z) - E[f(Z)]|\} + \epsilon\big]. \tag{13}$$

Thus the upper bound in Proposition 7 can be weakened to

$$\Pr\{|f(Z) - E[f(Z)]| > t\} \le \exp\big\{I_\epsilon(\alpha) + \alpha \log(D + \epsilon) - \alpha \log t\big\}, \tag{14}$$

where $D := E\{|f(Z) - E[f(Z)]|\}$. Next we use the Lipschitz property of $f$ to obtain an upper bound
for the above exponent which is uniform over all $f$ with $f(0)$ fixed. Since
$f(j) \in [f(0) - Kj, f(0) + Kj]$, we have $|f(j)| \le |f(0)| + Kj$, and hence

$$D \le 2E|f(Z)| \le 2|f(0)| + 2K\lambda q_1,$$

where we used the fact that the mean of the CP($\lambda, Q$) law is $\lambda q_1$. Substituting in (14) and taking
the infimum over $\alpha$ yields the required result (4), and it only remains to remove the boundedness
assumption on $f$. But since the bound itself only depends on $f$ via $f(0)$ and $K$, truncating $f$ at level
$\pm n$ and passing to the limit $n \to \infty$ proves part (i).

With $T = 2|f(0)| + 2K\lambda q_1 + \epsilon$, in order to evaluate the exponent

$$\inf_{0 < \alpha < L}\big[I_\epsilon(\alpha) + \alpha \log(T/t)\big] \tag{15}$$

in (4), we calculate the first two derivatives of $I_\epsilon(\alpha)$ with respect to $\alpha$ as,

$$I_\epsilon'(\alpha) = \lambda \sum_{j \ge 1} Q_j\big[C_{j,\epsilon}^\alpha - 1\big]\log C_{j,\epsilon} \quad \text{and} \quad I_\epsilon''(\alpha) = \lambda \sum_{j \ge 1} Q_j C_{j,\epsilon}^\alpha (\log C_{j,\epsilon})^2,$$

where the exchange of differentiation and expectation is justified by dominated convergence; observe
that, since $C_{j,\epsilon} > 1$, both are positive for all $\alpha > 0$. In particular, since $I_\epsilon(\alpha) > 0$, the
exponent (15) can only be negative (equivalently, the bound in (4) can only be less than 1) if the
second term in (15) is negative, i.e., if $t > T$. On the other hand, since $I_\epsilon'(0) = 0$ and
$I_\epsilon''(\alpha) > 0$ for all $\alpha$, we see that $I_\epsilon(\alpha)$ is locally quadratic around $\alpha = 0$. This means
that, as long as $t > T$, choosing $\alpha$ sufficiently small we can make (15) negative, therefore the bound
of the theorem is meaningful precisely when $t > T$.

To obtain the alternative representation, fix any $\epsilon > 0$ and set $i_\epsilon(\alpha) = I_\epsilon'(\alpha)$. Since
$I_\epsilon''(\alpha)$ is strictly positive, for $t > T$ the expression $I_\epsilon(\alpha) + \alpha \log(T/t)$ is uniquely
minimized at $\alpha^* > 0$ which solves $i_\epsilon(\alpha) = \log(t/T) > 0$. Hence, for all $t > T$, using
$I_\epsilon(0) = 0$ and integrating by parts,

$$\min_{0 < \alpha < L}\big[I_\epsilon(\alpha) + \alpha \log(T/t)\big] = I_\epsilon(\alpha^*) + \alpha^* \log(T/t)$$
$$= \int_0^{\alpha^*} i_\epsilon(s)\,ds + \alpha^* \log(T/t)$$
$$= \int_0^{i_\epsilon(\alpha^*)} x\,d\,i_\epsilon^{-1}(x) + \alpha^* \log(T/t)$$
$$= i_\epsilon(\alpha^*)\,i_\epsilon^{-1}(i_\epsilon(\alpha^*)) - \int_0^{i_\epsilon(\alpha^*)} i_\epsilon^{-1}(x)\,dx + \alpha^* \log(T/t)$$
$$= -\int_0^{\log(t/T)} i_\epsilon^{-1}(x)\,dx,$$

which proves part (ii). □
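The equality between the minimised exponent and $-\int_0^{\log(t/T)} i_\epsilon^{-1}(x)\,dx$ can be checked numerically. The sketch below works with a two-point $Q$, inverts the increasing function $i_\epsilon$ by bisection, integrates the inverse with Simpson's rule, and compares the result with both the first-order-condition value and a brute-force grid minimum. The parameters, the bisection bracket, and the names are assumptions for illustration; with a finitely supported $Q$ every $\alpha > 0$ is admissible.

```python
import math

LAM, Q, K, EPS = 1.0, {1: 0.5, 2: 0.5}, 1.0, 1.0       # hypothetical parameters
C = {j: 1.0 + j * K / EPS for j in Q}                   # C_{j,eps} = 1 + jK/eps

def I(alpha):
    return LAM * sum(Q[j] * (C[j] ** alpha - 1.0 - alpha * math.log(C[j])) for j in Q)

def i(alpha):
    """Derivative of I; increasing in alpha, with i(0) = 0."""
    return LAM * sum(Q[j] * (C[j] ** alpha - 1.0) * math.log(C[j]) for j in Q)

def i_inverse(s, hi=20.0):
    """Invert i on [0, hi] by bisection."""
    lo = 0.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if i(mid) < s:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def simpson(g, a, b, n=400):
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3.0

ratio = 5.0                                             # t / T, must exceed 1
a_star = i_inverse(math.log(ratio))                     # solves i(alpha) = log(t/T)
primal = I(a_star) - a_star * math.log(ratio)           # I(a*) + a* log(T/t)
dual = -simpson(i_inverse, 0.0, math.log(ratio))        # integral representation
grid_min = min(I(0.001 * k) - 0.001 * k * math.log(ratio) for k in range(1, 5001))
print("alpha* =", a_star)
print("primal =", primal, " dual =", dual, " grid minimum =", grid_min)
```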

Proof of Corollary 4. The proof is identical to that of Theorem 3, with the only difference
that, since here we simply have $g_\epsilon(x) = |f(x) - E[f(Z)]|$ for all $x$, we can replace the bound (13) by
$E[\log g_\epsilon(Z)] \le \log D$, where $D = E|f(Z) - E[f(Z)]|$. Proceeding as before gives the result. □

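Proof of Corollary 2. This is an application of Theorem 3 for specific values of $\alpha$ and $\epsilon$:
Bounding the infimum by the value at $\alpha = n$ and taking $\epsilon = 1$,

$$\Pr\{|f(Z) - Ef(Z)| > t\} \le \exp\Big\{I_1(n) + n \log\Big(\frac{2|f(0)| + 2K\lambda q_1 + 1}{t}\Big)\Big\}. \tag{16}$$

Using the binomial theorem to expand $I_1(n)$,

$$I_1(n) = \lambda \sum_{j \ge 1} Q_j\big\{C_{j,1}^n - 1 - n \log C_{j,1}\big\} = \lambda \sum_{j \ge 1} Q_j\big\{(1 + jK)^n - 1\big\} - \lambda n \sum_{j \ge 1} Q_j \log(1 + jK)$$
$$\le \lambda \sum_{j \ge 1} Q_j \sum_{r=1}^{n} \binom{n}{r}(jK)^r - \lambda n \sum_{j \ge 1} Q_j\big[\log j + \log K\big]$$
$$\le \lambda \sum_{r=1}^{n} \binom{n}{r} K^r q_r - \lambda n \log K.$$

Substituting this bound into (16) and rearranging yields the result. □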
Next we go on to prove the exponential concentration result Theorem 5 using the classical Herbst
argument in conjunction with the modified log-Sobolev inequality of Theorem 1.

Proof of Theorem 5. We proceed similarly to the proof of Proposition 7. Assume $f$ is bounded
and $K$-Lipschitz, and let $F(\tau) = E[\exp\{\tau f(Z)\}]$, $\tau > 0$, be the moment-generating function of
$f(Z)$. Dominated convergence justifies the differentiation

$$F'(\tau) = E\big[f(Z)e^{\tau f(Z)}\big],$$

so we can relate $F'(\tau)$ to the entropy of $e^{\tau f}$ by

$$\mathrm{Ent}_{CP_{\lambda,Q}}(e^{\tau f}) = \tau F'(\tau) - F(\tau)\log F(\tau) = \tau^2 F(\tau)\frac{d}{d\tau}\Big[\frac{\log F(\tau)}{\tau}\Big]. \tag{17}$$

Since $f$ is $K$-Lipschitz, the function $g := \tau f$ is $\tau K$-Lipschitz, so that $|D^j g| \le \tau K j$. Applying
Theorem 1 to $g$,

$$\mathrm{Ent}_{CP_{\lambda,Q}}(e^{\tau f}) = \mathrm{Ent}_{CP_{\lambda,Q}}(e^g) \le \lambda \sum_{j \ge 1} Q_j E\big[e^{g(Z)}\,\eta(|D^j g(Z)|)\big] \le \lambda F(\tau) \sum_{j \ge 1} Q_j\,\eta(j\tau K).$$

Combining this with (17) yields

$$\frac{d}{d\tau}\Big[\frac{\log F(\tau)}{\tau}\Big] \le \lambda \sum_{j \ge 1} Q_j \frac{\eta(j\tau K)}{\tau^2},$$

and integrating with respect to $\tau$ from 0 to $\alpha > 0$ we obtain

$$\frac{\log F(\alpha)}{\alpha} - E[f(Z)] \le \lambda \int_0^\alpha \sum_{j \ge 1} Q_j \frac{\eta(j\tau K)}{\tau^2}\,d\tau = \lambda \sum_{j \ge 1} Q_j\,jK \int_0^{j\alpha K} \frac{\eta(s)}{s^2}\,ds = \lambda \sum_{j \ge 1} Q_j\,jK\Big[\frac{e^{jK\alpha} - 1}{jK\alpha} - 1\Big],$$

where the exchange of the sum and integral is justified by Fubini's theorem since the integrand is
nonnegative. Therefore, we have the following bound on the moment-generating function $F$,

$$F(\alpha) \le \exp\{\alpha E[f(Z)] + H(\alpha)\}, \qquad \alpha > 0, \tag{18}$$

where $H(\alpha) = \lambda \sum_{j \ge 1} Q_j\big[e^{jK\alpha} - 1 - jK\alpha\big]$. An application of Markov's inequality now gives

$$\Pr\{f(Z) - E[f(Z)] > t\} \le e^{-\alpha t} E\big[\exp\{\alpha[f(Z) - E[f(Z)]]\}\big] = e^{-\alpha t} F(\alpha)e^{-\alpha E[f(Z)]} \le \exp\{H(\alpha) - \alpha t\}.$$

The removal of the boundedness assumption is a routine truncation argument as in the proof of
Theorem 3 or in [2][?]. In order to obtain the best bound for the deviation probability, we minimize
the exponent over $\alpha \in (0, M/K)$. This yields the first expression in Theorem 5; the second
representation follows from a standard argument as in the last part of the proof of Theorem 3 or [3]. □